
    Knowledge Unlearning for Mitigating Privacy Risks in Language Models

    Pretrained Language Models (LMs) memorize a vast amount of knowledge during initial pretraining, including information that may violate the privacy of personal lives and identities. Previous work addressing privacy issues for language models has mostly focused on data preprocessing and differential privacy methods, both of which require re-training the underlying LM. We propose knowledge unlearning as an alternative method for reducing privacy risks in LMs post hoc. We show that simply applying the unlikelihood training objective to target token sequences is effective at forgetting them with little to no degradation of general language modeling performance; it sometimes even substantially improves the underlying LM after just a few iterations. We also find that sequential unlearning is better than trying to unlearn all the data at once, and that unlearning is highly dependent on which kind of data (domain) is forgotten. Comparing against a previous data preprocessing method known to mitigate privacy risks for LMs, we show that unlearning can give a stronger empirical privacy guarantee in scenarios where the data vulnerable to extraction attacks are known a priori, while being orders of magnitude more computationally efficient. We release the code and dataset needed to replicate our results at https://github.com/joeljang/knowledge-unlearning
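    The core recipe described above is simple enough to sketch. Below is a minimal, illustrative sketch (not the authors' released implementation; see the linked repository for that) of unlearning a target sequence by penalizing its likelihood, here realized as gradient ascent on the standard causal-LM loss. The model name, learning rate, iteration count, and example text are assumptions for illustration only.

```python
# Minimal unlearning sketch (illustrative, not the authors' code):
# push a causal LM away from a memorized target sequence by ascending
# the language-modeling loss on that sequence.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM; the paper's models may differ
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # illustrative hyperparameters

# Hypothetical target sequence standing in for data vulnerable to extraction.
target_text = "Example sentence containing private information to be forgotten."
batch = tokenizer(target_text, return_tensors="pt")

for step in range(10):  # "just a few iterations", per the abstract
    outputs = model(**batch, labels=batch["input_ids"])
    # outputs.loss is the negative log-likelihood of the target tokens;
    # negating it turns gradient descent into gradient ascent on that NLL,
    # reducing the probability the LM assigns to the memorized sequence.
    unlearning_loss = -outputs.loss
    optimizer.zero_grad()
    unlearning_loss.backward()
    optimizer.step()
    print(f"step {step}: target NLL = {outputs.loss.item():.3f}")
```

    In practice one would monitor general language-modeling quality (e.g., validation perplexity) alongside extraction metrics to confirm the "little to no degradation" behavior reported in the abstract.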

    Contextualized Generative Retrieval

    The text retrieval task is mainly performed in two ways: the bi-encoder approach and the generative approach. The bi-encoder approach maps document and query embeddings to a common vector space and performs a nearest-neighbor search. It shows consistently high performance and efficiency across different domains, but it suffers from an embedding-space bottleneck because queries and documents interact only through L2 or inner-product distances. The generative retrieval model retrieves by generating a target sequence and overcomes this bottleneck by interacting in the parametric space. However, it fails to retrieve information it has not seen during training, as it depends solely on the information encoded in its own model parameters. To leverage the advantages of both approaches, we propose a Contextualized Generative Retrieval model, which uses contextualized embeddings (output embeddings of a language model encoder) as vocab embeddings at the decoding step of generative retrieval. The model uses information encoded in both the non-parametric space of contextualized token embeddings and the parametric space of the generative retrieval model. Our approach of generative retrieval with contextualized vocab embeddings outperforms generative retrieval with only vanilla vocab embeddings on the document retrieval task: on average 6% higher on KILT (NQ, TQA) and 2x higher on NQ-320k, suggesting the benefits of using contextualized embeddings in generative retrieval models.
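    To make the decoding-step idea concrete, the following is a minimal sketch (under assumptions; not the paper's implementation) of scoring decoder hidden states against contextualized encoder output embeddings instead of the static vocabulary embedding matrix. The T5-small backbone, the example query, and the candidate document identifier are all illustrative choices.

```python
# Illustrative sketch: replace the static vocab-embedding scoring of a
# generative retriever with scoring against contextualized token embeddings.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "t5-small"  # assumption: an encoder-decoder LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()

query = "who wrote on the origin of species"   # hypothetical query
identifier = "On the Origin of Species"        # hypothetical document identifier

with torch.no_grad():
    # Non-parametric side: contextualized "vocab" embeddings are the encoder
    # outputs over the identifier tokens, rather than rows of a fixed matrix.
    id_tokens = tokenizer(identifier, return_tensors="pt").input_ids
    ctx_embeds = model.encoder(input_ids=id_tokens).last_hidden_state[0]   # (T_id, d)

    # Parametric side: decoder hidden states while generating the identifier
    # conditioned on the query.
    query_tokens = tokenizer(query, return_tensors="pt").input_ids
    out = model(
        input_ids=query_tokens,
        decoder_input_ids=id_tokens,   # teacher-forced for this sketch
        output_hidden_states=True,
    )
    dec_hidden = out.decoder_hidden_states[-1][0]                          # (T_id, d)

    # Instead of logits = dec_hidden @ static_vocab_embeddings.T, score each
    # decoding step against the contextualized embeddings of candidate tokens.
    step_scores = dec_hidden @ ctx_embeds.T                                # (T_id, T_id)
    print(step_scores.shape)
```

    A real system would build the contextualized embedding index over the whole candidate corpus offline and use it during constrained decoding; the sketch only shows the change in how per-step logits are computed.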

    Nanosilver Colloids-Filled Photonic Crystal Arrays for Photoluminescence Enhancement

    To improve surface plasmon-coupled photoluminescence emission, a more accessible fabrication method for a controlled nanosilver pattern array was developed by filling a predefined hole array with a nanosilver colloid in a UV-curable resin via direct nanoimprinting. When applied to a light-emitting glass substrate with an oxide spacer layer on top of the nanosilver pattern, hybrid emission enhancements were produced from both the localized surface plasmon resonance-coupled emission enhancement and the guided light extraction from the photonic crystal array. When CdSe/ZnS nanocrystal quantum dots were deposited as the active emitter, a total photoluminescence intensity improvement of 84% was observed, attributed to contributions from both the silver nanoparticle filling and the nanoimprinted photonic crystal array.

    Clinical characteristics and mortality of patients with hematologic malignancies and COVID-19: a systematic review

    OBJECTIVE: Patients with hematologic cancer and Coronavirus Disease 2019 (COVID-19) tend to have a more serious disease course than the general population. Herein, we comprehensively reviewed the existing literature and analyzed the clinical characteristics and mortality of patients with hematologic malignancies and COVID-19. MATERIALS AND METHODS: By searching PubMed up to June 03, 2020, we identified 16 relevant case studies (33 cases) from a total of 45 studies reporting on patients with COVID-19 and hematologic malignancies. We investigated clinical and laboratory characteristics, including the type of hematologic malignancy, initial symptoms, laboratory findings, and clinical outcomes. We then compared these characteristics and outcomes with those of the general population infected with COVID-19. RESULTS: The median age was 66 years. Chronic lymphocytic leukemia was the most common type of hematologic malignancy (39.4%). Fever was the most common symptom (75.9%). Most patients had normal leukocyte counts (55.6%), lymphocytosis (45.4%), and normal platelet counts (68.8%). Compared with COVID-19 patients without underlying hematologic malignancies, dyspnea was more prevalent (45.0 vs. 24.9%, p=0.025). Leukocytosis (38.9 vs. 9.8%, p=0.001), lymphocytosis (45.4 vs. 8.2%, p=0.001), and thrombocytopenia (31.3 vs. 11.4%, p=0.036) were significantly more prevalent, and lymphopenia (18.2 vs. 57.4%, p=0.012) significantly less prevalent, in patients with hematologic malignancies. No clinical or laboratory characteristics predicted mortality in patients with hematologic malignancies. Mortality was much higher in patients with hematologic malignancies than in those without (40.0 vs. 3.6%, p<0.001). CONCLUSIONS: Co-occurrence of hematologic malignancies and COVID-19 is rare. However, given the high mortality from COVID-19 in this vulnerable population, further investigation into tailored treatment and management is required.